Abstract

Approaches from statistical physics are applied to investigate the structure of network models 
whose growth rules mimic aspects of the evolution of the world-wide web. We first determine 
the degree distribution of a growing network in which nodes are introduced one at a time and 
attach to an earlier node of degree k with rate $A_k \sim k^\gamma$. Very different behaviors arise for
$\gamma < 1$, $\gamma = 1$, and $\gamma > 1$. We also analyze the degree distribution of a heterogeneous network,
the joint age-degree distribution, the correlation between degrees of neighboring nodes, as well 
as global network properties. An extension to directed networks is then presented. By tuning 
model parameters to reasonable values, we obtain distinct power-law forms for the in-degree 
and out-degree distributions with exponents that are in good agreement with current data for 
the web. Finally, a general growth process with independent introduction of nodes and links is 
investigated. This leads to independently growing sub-networks that may coalesce with other 
sub-networks. General results for both the size distribution of sub-networks and the degree 
distribution are obtained. 



1 Introduction 

With the recent appearance of the Internet and the world-wide web, understanding the properties 
of growing networks with popularity-based construction rules has become an active and fruitful 
research area. In such models, newly-introduced nodes preferentially attach to pre-existing
nodes of the network that are already "popular". This leads to graphs whose structure is quite
different from the well-known random graph [2], in which links are created at random between
nodes without regard to their popularity. This discovery of a new class of graph theory problems 
has fueled much effort to characterize their properties. 

One basic measure of the structure of such networks is the node degree distribution $N_k$, defined as the number
of nodes in the network that are linked to k other nodes. In the case of the random graph, the
node degree distribution is simply Poisson. In contrast, many popularity-driven growing networks
have much broader degree distributions with a stretched exponential or a power-law tail. The latter
form means that there is no characteristic scale for the node degree, a feature that typifies many
networked systems.

Power laws, or more generally, distributions with highly skewed tails, characterize the degree
distributions of many man-made and naturally occurring networks. For example, the degree
distributions at the level of autonomous systems and at the router level exhibit highly skewed tails
[5]. Other important Internet-based graphs, such as the hyperlink graph of the world-wide web,
also appear to have a degree distribution with a power-law tail [10, 11]. These observations
have spurred a flurry of recent work to understand the underlying mechanisms for these phenomena.
A related example, of interest to anyone who publishes, is the distribution of scientific citations.
Here one treats publications as nodes and citations as links in a citation graph.
Currently-available data suggest that the citation distribution has a power-law tail with an associated
exponent close to $-3$ [14]. As we shall see, this exponent emerges naturally in the Growing
Network (GN) model where the relative probability of linking from a new node to a previous node 
(equivalent to citing an earlier paper) is strictly proportional to the popularity of the target node. 

In this paper, we apply tools from statistical physics, especially the rate equation approach, 
to quantify the structure of growing networks and to elucidate the types of geometrical features 
that arise in networks with physically-motivated growth rules. The utility of the rate equations 
has been demonstrated in a diverse range of phenomena in non-equilibrium statistical physics,
such as aggregation [15], coarsening [16], and epitaxial surface growth [17]. We will attempt to
convince the reader that the rate equations are also a simple yet powerful tool for analyzing
growing network systems. In addition to providing comprehensive information about the node 
degree distribution, the rate equations can be easily adapted to analyze both heterogeneous and 
directed networks, the age distribution of nodes, correlations between node degrees, various global 
network properties, as well as the cluster size distribution in models that give rise to independently 
evolving sub-networks. Thus the rate equation method appears to be better suited for probing the 
structure of growing networks compared to the classical approaches for analyzing random graphs, 
such as probabilistic or generating function techniques.

In the next section, we introduce three basic models that will be the focus of this review. In 
the following three sections, we then present rate equation analyses to determine basic geometrical 
properties of these networks. We close with a brief summary. 

2 Models 

The models we study appear to embody many of the basic growth processes in web graphs and 
related systems. These include: 

• The Growing Network (GN) [18]. Nodes are added one at a time and a single link is established
between the new node and a pre-existing node according to an attachment probability
$A_k$ that depends only on the degree k of the "target" node (Fig. 1).




Figure 1: Growing network. Nodes are added sequentially and a single link joins a new node to an 
earlier node. Node 1 has (total) degree 5, node 2 has degree 3, nodes 4 and 6 have degree 2, and 
the remaining nodes have degree 1. 



• The Web Graph (WG). This represents an extension of the GN to incorporate link directionality
[19] and leads to independent, dynamically generated in-degree and out-degree distributions.
The network growth occurs by two distinct processes that are meant to mimic
how hyperlinks are created in the web (Fig. 2):



(i) With probability p, a new node is introduced and it immediately attaches to an earlier 
target node. The attachment probability depends only on the in-degree of the target. 

(ii) With probability $q = 1 - p$, a new link is created between already existing nodes. The
choices of the originating and target nodes depend on the out-degree of the former and 
the in-degree of the latter. 







Figure 2: Growth processes in the web graph model: (i) node creation and immediate attachment, 
and (ii) link creation. In (i) the new node is shaded, while in both (i) and (ii) the new link is 
dashed. 



• The Multicomponent Graph (MG). Nodes and links are introduced independently [21]. (i) 
With probability p, a new unlinked node is introduced, while (ii) with probability $q = 1 - p$, a
new link is created between existing nodes. As in the WG, the choices of the originating and 
target nodes depend on the out-degree of the former and the in-degree of the latter. Step (i) 
allows for the formation of many clusters. 
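
The GN growth rule can be illustrated with a short simulation. What follows is a minimal sketch rather than the authors' code: the function names `grow_network` and `degree_fractions`, the two-node initial condition, and the choice of attachment weight `k ** gamma` are assumptions made for illustration, and the quadratic-time sampling is only practical for small networks.

```python
import random
from collections import Counter

def grow_network(num_nodes, gamma=1.0, seed=0):
    """Grow a GN: each new node links to one earlier node chosen with
    probability proportional to (degree)**gamma. Returns the degree list."""
    rng = random.Random(seed)
    # Assumed initial condition: two nodes joined by a single link.
    degree = [1, 1]
    for _ in range(num_nodes - 2):
        weights = [k ** gamma for k in degree]
        target = rng.choices(range(len(degree)), weights=weights, k=1)[0]
        degree[target] += 1   # the target node gains one link
        degree.append(1)      # the new node enters with degree 1
    return degree

def degree_fractions(degree):
    """Fraction of nodes that have each degree k."""
    counts = Counter(degree)
    total = len(degree)
    return {k: c / total for k, c in sorted(counts.items())}

if __name__ == "__main__":
    deg = grow_network(2000, gamma=1.0, seed=1)
    for k, frac in list(degree_fractions(deg).items())[:5]:
        print(k, round(frac, 3))
```

Because every added node brings exactly one link, the sum of all degrees is always twice the number of links, which gives a quick sanity check on any run.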

3 Structure of the Growing Network 

Because of its simplicity, we first study the structure of the GN [18]. The basic approaches
developed in this section will then be extended to the WG and MG models. 

3.1 Degree Distribution of a Homogeneous Network 

We first focus on the node degree distribution $N_k$. To determine its evolution, we shall write the
rate equations that account for the change in the degree distribution after each node addition event.
These equations contain complete information about the node degree, from which any measure of
node degree (such as moments) can be easily extracted. For the GN growth process in which nodes
are introduced one at a time, the rate equations for the degree distribution $N_k(t)$ are [22]

$$\frac{dN_k}{dt} = \frac{A_{k-1}N_{k-1} - A_k N_k}{A} + \delta_{k1}. \tag{3.1}$$



The first term on the right, $A_{k-1}N_{k-1}/A$, accounts for processes in which a node with $k-1$
links is connected to the new node, thus increasing $N_k$ by one. Since there are $N_{k-1}$ nodes of
degree $k-1$, the rate at which such processes occur is proportional to $A_{k-1}N_{k-1}$, and the factor
$A(t) = \sum_{j\geq 1} A_j N_j(t)$ converts this rate into a normalized probability. A corresponding role is
played by the second (loss) term on the right-hand side; $A_k N_k/A$ is the probability that a node
with k links is connected to the new node, thus leading to a loss in $N_k$. The last term accounts for
the introduction of new nodes with no incoming links.

We start by solving for the time dependence of the moments of the degree distribution, defined
via $M_n(t) = \sum_{j\geq 1} j^n N_j(t)$. This is a standard method of analysis of rate equations by which one
can gain partial, but valuable, information about the time dependence of the system with minimal
effort. By explicitly summing Eqs. (3.1) over all k, we easily obtain $\dot M_0(t) = 1$, whose solution is
$M_0(t) = M_0(0) + t$. Notice that by definition $M_0(t) = \sum_k N_k$ is just the total number of nodes
in the network. It is clear by the nature of the growth process that this quantity simply grows as
t. In a similar fashion, the first moment of the degree distribution obeys $\dot M_1(t) = 2$ with solution
$M_1(t) = M_1(0) + 2t$. This time evolution for $M_1$ can be understood either by explicitly summing
the rate equations, or by observing that this first moment simply equals the total number of link
endpoints. Clearly, this quantity must grow as 2t since the introduction of a single node introduces
two link endpoints. Thus we find the simple result that the first two moments are independent of
the attachment kernel $A_k$ and grow linearly with time. On the other hand, higher moments and
the degree distribution itself do depend in an essential way on the kernel $A_k$.

As a preview to the general behavior for the degree distribution, consider the strictly linear
kernel $A_k = k$ [22, 23], for which $A(t)$ coincides with $M_1(t)$. In this case, we can solve Eqs. (3.1) for
an arbitrary initial condition. However, since the long-time behavior is most interesting, we limit
ourselves to the asymptotic regime ($t \to \infty$) where the initial condition is irrelevant. Using therefore
$M_1 = 2t$, we solve the first few of Eqs. (3.1) directly and obtain $N_1 = 2t/3$, $N_2 = t/6$, etc. Thus
each of the $N_k$ grows linearly with time. Accordingly, we substitute $N_k(t) = t\,n_k$ in Eqs. (3.1) to
yield the simple recursion relation $n_k = n_{k-1}(k-1)/(k+2)$. Solving for $n_k$ gives

$$n_k = \frac{4}{k(k+1)(k+2)}. \tag{3.2}$$
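
The recursion for the strictly linear kernel can be checked against the closed form of Eq. (3.2) with exact rational arithmetic. This is a small illustrative sketch; the helper names `n_k_recursion` and `n_k_closed` are hypothetical and not taken from the paper.

```python
from fractions import Fraction

def n_k_recursion(k_max):
    """Iterate n_k = n_{k-1} (k-1)/(k+2), starting from n_1 = 2/3."""
    n = {1: Fraction(2, 3)}
    for k in range(2, k_max + 1):
        n[k] = n[k - 1] * Fraction(k - 1, k + 2)
    return n

def n_k_closed(k):
    """Closed form of Eq. (3.2): 4 / [k (k+1) (k+2)]."""
    return Fraction(4, k * (k + 1) * (k + 2))

values = n_k_recursion(50)
# The recursion and the closed form agree exactly for every k tested.
assert all(values[k] == n_k_closed(k) for k in values)
print(values[1], values[2], values[3])
```

Using `Fraction` avoids any rounding question, so agreement here is an exact identity for the range tested rather than a numerical coincidence.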

Returning to the case of general attachment kernels, let us assume that the degree distribution
and $A(t)$ both grow linearly with time. This hypothesis can be easily verified numerically for
attachment kernels that do not grow faster than linearly with k. Then substituting $N_k(t) = t\,n_k$
and $A(t) = \mu t$ into Eqs. (3.1) we obtain the recursion relation $n_k = n_{k-1}A_{k-1}/(\mu + A_k)$ and
$n_1 = \mu/(\mu + A_1)$. Finally, solving for $n_k$, we obtain the formal expression

$$n_k = \frac{\mu}{A_k}\prod_{j=1}^{k}\left(1 + \frac{\mu}{A_j}\right)^{-1}. \tag{3.3}$$

To complete the solution, we need the amplitude $\mu$. Using the definition $\mu = \sum_{j\geq 1} A_j n_j$ in Eq. (3.3),
we obtain the implicit relation

$$\sum_{k=1}^{\infty}\prod_{j=1}^{k}\left(1 + \frac{\mu}{A_j}\right)^{-1} = 1, \tag{3.4}$$

which shows that the amplitude $\mu$ depends on the entire attachment kernel.
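
Since Eq. (3.4) fixes the amplitude only implicitly, it has to be solved numerically for most kernels. The sketch below is one possible way to do so, under stated assumptions: the sum is truncated at `k_max` terms, the root is bracketed between 0.5 and 4, and the function names `amplitude_sum` and `solve_mu` are hypothetical.

```python
def amplitude_sum(mu, kernel, k_max=20000):
    """Left-hand side of Eq. (3.4), truncated after k_max terms."""
    total, product = 0.0, 1.0
    for k in range(1, k_max + 1):
        product /= 1.0 + mu / kernel(k)   # running product over j = 1..k
        total += product
    return total

def solve_mu(kernel, lo=0.5, hi=4.0, tol=1e-9):
    """Bisection for the amplitude mu at which the sum equals one.
    The sum decreases as mu increases, so the root is bracketed by (lo, hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if amplitude_sum(mid, kernel) > 1.0:
            lo = mid   # sum too large: mu must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    kernels = [("constant", lambda k: 1.0),
               ("square root", lambda k: k ** 0.5),
               ("linear", lambda k: float(k))]
    for name, kernel in kernels:
        print(name, round(solve_mu(kernel), 3))
```

For the linear kernel the terms decay only as a power law, so the truncation leaves a small residual error in the amplitude; raising `k_max` reduces it.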

For the generic case $A_k \sim k^\gamma$, we substitute this form into Eq. (3.3) and then rewrite the
product as the exponential of a sum of logarithms. In the continuum limit, we convert this sum
to an integral, expand the logarithm to lowest order, and then evaluate the integral to yield the
following basic results:

$$n_k \sim \begin{cases}
k^{-\gamma}\exp\left[-\mu\,\dfrac{k^{1-\gamma}-2^{1-\gamma}}{1-\gamma}\right], & 0 \leq \gamma < 1;\\
k^{-\nu},\ \nu > 2, & \gamma = 1;\\
\text{best seller}, & 1 < \gamma < 2;\\
\text{bible}, & 2 < \gamma.
\end{cases} \tag{3.5}$$



Thus the degree distribution decays exponentially for $\gamma = 0$, as in the case of the random
graph, while for all $0 < \gamma < 1$, the distribution exhibits robust stretched exponential behavior. The
linear kernel is the case that has garnered much of the current research interest. As shown above,
$n_k = 4/[k(k+1)(k+2)]$ for the strictly linear kernel $A_k = k$. One might anticipate that $n_k \propto k^{-3}$
holds for all asymptotically linear kernels, $A_k \sim k$. However, the situation is more delicate and
the degree distribution exponent depends on microscopic details of $A_k$. From Eq. (3.3), we obtain
$n_k \sim k^{-\nu}$, where the exponent $\nu = 1 + \mu$ can be tuned to any value larger than 2 [24]. This
non-universal behavior shows that one must be cautious in drawing general conclusions from the
GN with a linear attachment kernel.






Figure 3: A node with in-degree i = 4, out-degree j = 5, and total degree 9. 



As an illustrative example of the vagaries of asymptotically linear kernels, consider the shifted
linear kernel $A_k = k + w$. One way to motivate this kernel is to explicitly keep track of link
directionality. In particular, the node degree for an undirected graph naturally generalizes to the
in-degree and out-degree for a directed graph, the number of incoming and outgoing links at a
node, respectively. Thus the total degree k in a directed graph is the sum of the in-degree i and
out-degree j (Fig. 3). (More details on this model are given in the next section.) The most general
linear attachment kernel for a directed graph has the form $A_{ij} = a\,i + b\,j$. The GN corresponds
to the case where the out-degree of any node equals one; thus $j = 1$ and $k = i + 1$. For this
example the general linear attachment kernel reduces to $A_k = a(k-1) + b$. Since the overall scale
is irrelevant, we can rewrite $A_k$ as the shifted linear kernel $A_k = k + w$, with $w = -1 + b/a$ that
can vary over the range $-1 < w < \infty$.

To determine the degree distribution for the shifted linear kernel, note that $A(t) = \sum_j A_j N_j(t)$
simply equals $A(t) = M_1(t) + w M_0(t)$. Using $A = \mu t$, $M_0 = t$ and $M_1 = 2t$, we get $\mu = 2 + w$ and
hence the relation $\nu = 1 + \mu$ from the previous paragraph becomes $\nu = 3 + w$. Thus a simple additive
shift in the attachment kernel profoundly affects the asymptotic degree distribution. Furthermore,
from Eq. (3.3) we determine the entire degree distribution to be

$$n_k = (2+w)\,\frac{\Gamma(3+2w)}{\Gamma(1+w)}\,\frac{\Gamma(k+w)}{\Gamma(k+3+2w)}. \tag{3.6}$$
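
The gamma-function form for the shifted linear kernel can be compared numerically with the general recursion $n_k = n_{k-1}A_{k-1}/(\mu + A_k)$, using $A_k = k + w$ and $\mu = 2 + w$. The following is a hedged sketch; `n_k_shifted` and `n_k_by_recursion` are illustrative names, and log-gamma functions are used only for numerical stability.

```python
import math

def n_k_shifted(k, w):
    """Gamma-function expression for n_k with kernel A_k = k + w (w > -1)."""
    log_val = (math.log(2 + w)
               + math.lgamma(3 + 2 * w) - math.lgamma(1 + w)
               + math.lgamma(k + w) - math.lgamma(k + 3 + 2 * w))
    return math.exp(log_val)

def n_k_by_recursion(k_max, w):
    """Iterate n_k = n_{k-1} A_{k-1} / (mu + A_k) with mu = 2 + w."""
    mu = 2 + w
    n = [0.0, mu / (mu + 1 + w)]          # index 0 unused; n_1 = mu/(mu + A_1)
    for k in range(2, k_max + 1):
        n.append(n[-1] * (k - 1 + w) / (mu + k + w))
    return n

if __name__ == "__main__":
    for w in (-0.5, 0.0, 1.5):
        rec = n_k_by_recursion(5, w)
        print(w, [round(rec[k], 5) for k in range(1, 6)],
              [round(n_k_shifted(k, w), 5) for k in range(1, 6)])
```

Setting `w = 0` should recover the strictly linear result $4/[k(k+1)(k+2)]$, which gives an independent check on the prefactor.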

Finally, we outline the intriguing behavior for super-linear kernels. In this case, there is a
"runaway" or gelation-like phenomenon in which one node links to almost every other node. For
$\gamma > 2$, all but a finite number of nodes are linked to a single node that has the rest of the links.
We term such an overwhelmingly popular node a "bible". For $1 < \gamma < 2$, the number of nodes
with just a few links is no longer finite, but grows slower than linearly in time, and the remainder
of the nodes are linked to an extremely popular node that we now term a "best seller". Full details
about this runaway behavior are given in [22].



As a final parenthetical note, when the attachment kernel has the form $A_k \propto k^\gamma$, with $\gamma < 0$,
there is preferential attachment to poorly-connected sites. Here, the degree distribution exhibits
faster than exponential decay, $n_k \propto k^{-|\gamma|(k-1)}$. When $\gamma < -2$, the propensity for avoiding popularity
is so strong that there is a finite probability of forming a "worm" graph in which each node attaches
only to its immediate predecessor.

3.2 Degree Distribution of a Heterogeneous Network 

A practically-relevant generalization of the GN is to endow each node with an intrinsic and permanently
defined "attractiveness" [25]. This accounts for the obvious fact that not all nodes are
equivalent, but that some are clearly more attractive than others at their inception. Thus the 
subsequent attachment rate to a node should be a function of both its degree and its intrinsic 
attractiveness. For this generalization, the rate equation approach yields complete results with 
minimal additional effort beyond that needed to solve the homogeneous network. 

Let us assign each node an attractiveness parameter $\eta > 0$, with arbitrary distribution, at its
inception. This attractiveness modifies the node attachment rate as follows: for a node with degree
k and attractiveness $\eta$, the attachment rate is simply $A_k(\eta)$. Now we need to characterize nodes
both by their degree and their attractiveness; thus $N_k(\eta)$ is the number of nodes with degree k
and attractiveness $\eta$. This joint degree-attractiveness distribution obeys the rate equation

$$\frac{dN_k(\eta)}{dt} = \frac{A_{k-1}(\eta)N_{k-1}(\eta) - A_k(\eta)N_k(\eta)}{A} + p_0(\eta)\,\delta_{k1}. \tag{3.7}$$

Here $p_0(\eta)$ is the probability that a newly-introduced node has attractiveness $\eta$, and the normalization
factor is $A = \int d\eta \sum_k A_k(\eta)N_k(\eta)$.

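Following the same approach as that used to analyze Eq. (3.1), we substitute $A = \mu t$ and
$N_k(\eta) = t\,n_k(\eta)$ into Eq. (3.7) to obtain the recursion relation

$$\left[A_k(\eta) + \mu\right] n_k(\eta) = A_{k-1}(\eta)\,n_{k-1}(\eta) + \mu\,p_0(\eta)\,\delta_{k1}. \tag{3.8}$$

For concreteness, consider the linear attachment kernel $A_k(\eta) = \eta k$. Then applying the same
analysis as in the homogeneous network, we find

$$n_k(\eta) = \frac{\mu\,p_0(\eta)}{\eta}\,\frac{\Gamma(k)\,\Gamma(1+\mu/\eta)}{\Gamma(k+1+\mu/\eta)}. \tag{3.9}$$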

To determine the amplitude $\mu$ we substitute (3.9) into the definition $\mu = \int d\eta \sum_{k\geq 1} A_k(\eta)\,n_k(\eta)$
and use the identity [26]

$$\sum_{k=1}^{\infty}\frac{\Gamma(k+u)}{\Gamma(k+v)} = \frac{\Gamma(u+1)}{(v-u-1)\,\Gamma(v)}$$

to simplify the sum. This yields the implicit relation

$$1 = \int d\eta\, p_0(\eta)\left(\frac{\mu}{\eta} - 1\right)^{-1}. \tag{3.10}$$



This condition on $\mu$ leads to two alternatives: If the support of $\eta$ is unbounded, then the integral
diverges and there is no solution for $\mu$. In this limit, the most attractive node is connected to a finite
fraction of all links. Conversely, if the support of $\eta$ is bounded, the resulting degree distribution is
similar to that of the homogeneous network. For fixed $\eta$, $n_k(\eta) \sim k^{-\nu(\eta)}$ with an attractiveness-dependent
decay exponent $\nu(\eta) = 1 + \mu/\eta$. Amusingly, the total degree distribution $n_k = \int d\eta\, n_k(\eta)$
is no longer a strict power law [25]. Rather, the asymptotic behavior is governed by properties of
the initial attractiveness distribution near the upper cutoff. In particular, if $p_0(\eta) \sim (\eta_{\max} - \eta)^{\omega-1}$
(with $\omega > 0$ to ensure normalization), the total degree distribution exhibits a logarithmic correction

$$n_k \sim k^{-(1+\mu/\eta_{\max})}\,(\ln k)^{-\omega}. \tag{3.11}$$
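
For a bounded attractiveness distribution the implicit condition (3.10) can be solved numerically. The sketch below assumes, purely for illustration, a uniform density $p_0(\eta) = 1/\eta_{\max}$ on $(0, \eta_{\max})$; this choice, the midpoint-rule quadrature, and the names `fitness_integral` and `solve_mu` are assumptions and do not come from the paper.

```python
import math

def fitness_integral(mu, eta_max=1.0, steps=20000):
    """Midpoint-rule estimate of the integral in Eq. (3.10) for an assumed
    uniform density p_0 = 1/eta_max on (0, eta_max). Requires mu > eta_max."""
    h = eta_max / steps
    total = 0.0
    for i in range(steps):
        eta = (i + 0.5) * h
        total += (1.0 / eta_max) / (mu / eta - 1.0) * h
    return total

def solve_mu(eta_max=1.0, tol=1e-9):
    """Bisection for mu; the integral decreases monotonically as mu grows."""
    lo, hi = eta_max * (1.0 + 1e-6), 10.0 * eta_max
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fitness_integral(mid, eta_max) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    mu = solve_mu()
    print("mu =", round(mu, 4), " exponent at eta_max =", round(1 + mu, 4))
```

For the uniform density the integral can also be done by hand, giving the condition $\mu \ln[\mu/(\mu - \eta_{\max})]/\eta_{\max} = 2$, which offers a cross-check on the quadrature.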



3.3 Age Distribution 

In addition to the degree distribution, we determine when connections occur. Naively, we expect 
that older nodes will be better connected. We study this feature by resolving each node both by 
its degree and its age to provide a more complete understanding of the network evolution. Thus 
define $c_k(t,a)$ to be the average number of nodes of age a that have $k-1$ incoming links at time
t. Here age a means that the node was introduced at time $t - a$. The original degree distribution
may be recovered from the joint age-degree distribution through $N_k(t) = \int_0^t da\, c_k(t,a)$.

For simplicity, we consider only the case of the strictly linear kernel, $A_k = k$; more general kernels were
considered in Ref. [24]. The joint age-degree distribution evolves according to the rate equation

$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial a}\right)c_k = \frac{A_{k-1}c_{k-1} - A_k c_k}{2t} + \delta_{k1}\,\delta(a). \tag{3.12}$$

The second term on the left accounts for the aging of nodes. We assume here that the probability
of linking to a given node again depends only on its degree and not on its age. Finally, we again
have used $A(t) = M_1(t) \simeq 2t$ for the linear attachment kernel in the long-time limit.

The homogeneous form of this equation implies that the solution should be self-similar. Thus we
seek a solution as a function of the single variable $a/t$ rather than two separate variables. Writing
$c_k(t,a) = f_k(x)$ with $x = 1 - a/t$, we convert Eq. (3.12) into the ordinary differential equation

$$-2x\,\frac{df_k}{dx} = (k-1)f_{k-1} - k f_k. \tag{3.13}$$

We omit the delta function term, since it merely provides the boundary condition $c_k(t, a=0) = \delta_{k1}$,
or $f_k(1) = \delta_{k1}$.

The solution to this boundary-value problem may be simplified by assuming the exponential
solution $f_k = \Phi\,\varphi^{k-1}$; this is consistent with the boundary condition, provided that $\Phi(1) = 1$
and $\varphi(1) = 0$. This ansatz reduces the infinite set of rate equations (3.13) into two elementary
differential equations for $\varphi(x)$ and $\Phi(x)$ whose solutions are $\varphi(x) = 1 - \sqrt{x}$ and $\Phi(x) = \sqrt{x}$. In
terms of the original variables a and t, the joint age-degree distribution is then

$$c_k(t,a) = \sqrt{1-\frac{a}{t}}\left[1 - \sqrt{1-\frac{a}{t}}\right]^{k-1}. \tag{3.14}$$

Thus the degree distribution for fixed-age nodes decays exponentially, with a characteristic
degree that diverges as $\langle k\rangle \sim (1 - a/t)^{-1/2}$ for $a \to t$. As expected, young nodes (those with
$a/t \to 0$) typically have a small degree while old nodes have large degree (Fig. 4). It is the large
characteristic degree of old nodes that ultimately leads to a power-law total degree distribution
when the joint age-degree distribution is integrated over all ages.

Figure 4: Age-dependent degree distribution for the GN for the linear attachment kernel. Low-degree
nodes tend to be relatively young while high-degree nodes are old. The inset shows detail
for a/t > 0.98. 
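
The statement that integrating the joint age-degree distribution over all ages recovers the power-law degree distribution can be checked numerically. The sketch below works in the scaled variable $x = 1 - a/t$, so that $N_k/t$ becomes the integral of $\sqrt{x}\,(1-\sqrt{x})^{k-1}$ over $0 < x < 1$, and compares the result with $4/[k(k+1)(k+2)]$. The function names and the midpoint-rule quadrature are illustrative assumptions.

```python
import math

def c_k(k, x):
    """Joint age-degree distribution as a function of x = 1 - a/t."""
    root = math.sqrt(x)
    return root * (1.0 - root) ** (k - 1)

def n_k_from_ages(k, steps=100000):
    """Midpoint rule for n_k = N_k / t = integral of c_k over 0 < x < 1."""
    h = 1.0 / steps
    return sum(c_k(k, (i + 0.5) * h) for i in range(steps)) * h

if __name__ == "__main__":
    for k in (1, 2, 3, 5):
        exact = 4.0 / (k * (k + 1) * (k + 2))
        print(k, round(n_k_from_ages(k), 6), round(exact, 6))
```

The agreement follows analytically from the substitution $u = \sqrt{x}$, which turns the integral into a beta function.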



3.4 Node Degree Correlations 

The rate equation approach is sufficiently versatile that we can also obtain much deeper geometrical
properties of growing networks. One such property is the correlation between degrees of connected
nodes [24]. Such correlations develop naturally because a node with large degree is likely to be old. Thus its
ancestor is also old and hence also has a large degree. In the context of the web, this correlation
merely expresses the obvious fact that popular web sites are more likely to have hyperlinks
among each other than to marginal sites.

To quantify the node degree correlation, we define $C_{kl}(t)$ as the number of nodes of degree k
that attach to an ancestor node of degree l (Fig. 5). For example, in the network of Fig. 1, there
are $N_1 = 6$ nodes of degree 1, with $C_{12} = C_{13} = C_{15} = 2$. There are also $N_2 = 2$ nodes of degree
2, with $C_{25} = 2$, and $N_3 = 1$ node of degree 3, with $C_{35} = 1$.




Figure 5: Definition of the node degree correlation $C_{kl}$ for the case k = 3 and l = 4.



For simplicity, we again specialize to the case of the strictly linear attachment kernel. More
general kernels can also be treated within our general framework [24]. For the linear attachment
kernel, the degree correlation $C_{kl}(t)$ evolves according to the rate equation

$$M_1\frac{dC_{kl}}{dt} = (k-1)C_{k-1,l} - kC_{kl} + (l-1)C_{k,l-1} - lC_{kl} + (l-1)N_{l-1}\,\delta_{k1}. \tag{3.15}$$

The processes that give rise to each term in this equation are illustrated in Fig. 6. The first two
terms on the right account for the change in $C_{kl}$ due to the addition of a link onto a node of degree
$k-1$ (gain) or k (loss) respectively, while the second set of terms gives the change in $C_{kl}$ due to
the addition of a link onto the ancestor node. Finally, the last term accounts for the gain in $C_{1l}$
due to the addition of a new node, which attaches to one of the $N_{l-1}$ nodes of degree $l-1$.




Figure 6: The processes that contribute ((i)-(v) in order) to the various terms in the rate equation
(3.15). The newly-added node and link are shown dashed.



As in the case of the node degree, the time dependence can be separated as $C_{kl} = t\,c_{kl}$. With
$M_1 = 2t$, this reduces Eqs. (3.15) to the time-independent recursion relation

$$(k+l+2)\,c_{kl} = (k-1)\,c_{k-1,l} + (l-1)\,c_{k,l-1} + (l-1)\,n_{l-1}\,\delta_{k1}. \tag{3.16}$$

This can be further reduced to a constant-coefficient inhomogeneous recursion relation by the
substitution

$$c_{kl} = \frac{\Gamma(k)\,\Gamma(l)}{\Gamma(k+l+3)}\,d_{kl}$$

to yield

$$d_{kl} = d_{k-1,l} + d_{k,l-1} + 4(l+2)\,\delta_{k1}. \tag{3.17}$$

Solving Eqs. (3.17) for the first few k yields the pattern of dependence on k and l from which one
can then infer the solution

$$d_{kl} = 4\,\frac{\Gamma(k+l)}{\Gamma(k+2)\,\Gamma(l-1)} + 12\,\frac{\Gamma(k+l-1)}{\Gamma(k+1)\,\Gamma(l-1)}, \tag{3.18}$$

from which we ultimately obtain

$$c_{kl} = \frac{4(l-1)}{k(k+l)(k+l+1)(k+l+2)}\left[\frac{1}{k+1} + \frac{3}{k+l-1}\right]. \tag{3.19}$$



The important feature of this result is that the joint distribution does not factorize, that is, $c_{kl} \neq n_k n_l$.
This correlation between the degrees of connected nodes is an important distinction between
the GN and classical random graphs.

While the solution of Eq. (3.19) is unwieldy, it greatly simplifies in the scaling regime, $k \to \infty$
and $l \to \infty$ with $y = l/k$ finite. The scaled form of the solution is

$$c_{kl} = k^{-4}\,\frac{4y(y+4)}{(1+y)^4}. \tag{3.20}$$



For fixed large k, the distribution $c_{kl}$ has a single maximum at $y^* = (\sqrt{33}-5)/2 \approx 0.372$. Thus
a node whose degree k is large is typically linked to another node whose degree is also large; the
typical degree of the ancestor is 37% that of the daughter node. In general, when k and l are both
large and their ratio is different from one, the limiting behaviors of $c_{kl}$ are

$$c_{kl} \to \begin{cases} 16\,l/k^5, & l \ll k,\\ 4/(k^2 l^2), & l \gg k.\end{cases} \tag{3.21}$$

Here we explicitly see the absence of factorization in the degree correlation: $c_{kl} \neq n_k n_l \propto (kl)^{-3}$.
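
The closed form for the degree correlation can be checked directly against the recursion that defines it, $(k+l+2)c_{kl} = (k-1)c_{k-1,l} + (l-1)c_{k,l-1} + (l-1)n_{l-1}\delta_{k1}$, with $n_k = 4/[k(k+1)(k+2)]$. The following sketch does this in exact rational arithmetic and also locates the maximum of the scaling function $4y(y+4)/(1+y)^4$. The function names are hypothetical, and the source term is taken to involve the degree distribution $n_{l-1}$.

```python
import math
from fractions import Fraction

def n(k):
    """Degree distribution for the linear kernel; zero for k < 1."""
    return Fraction(4, k * (k + 1) * (k + 2)) if k >= 1 else Fraction(0)

def c(k, l):
    """Closed-form degree correlation c_kl; zero outside k >= 1, l >= 1."""
    if k < 1 or l < 1:
        return Fraction(0)
    prefactor = Fraction(4 * (l - 1), k * (k + l) * (k + l + 1) * (k + l + 2))
    return prefactor * (Fraction(1, k + 1) + Fraction(3, k + l - 1))

def recursion_holds(k, l):
    """Check the time-independent recursion for one pair (k, l)."""
    source = (l - 1) * n(l - 1) if k == 1 else Fraction(0)
    lhs = (k + l + 2) * c(k, l)
    rhs = (k - 1) * c(k - 1, l) + (l - 1) * c(k, l - 1) + source
    return lhs == rhs

def scaled(y):
    """Scaling function multiplying k**-4 in the large-degree limit."""
    return 4 * y * (y + 4) / (1 + y) ** 4

if __name__ == "__main__":
    print(all(recursion_holds(k, l) for k in range(1, 15) for l in range(1, 15)))
    y_star = (math.sqrt(33) - 5) / 2
    print(round(y_star, 3), scaled(y_star) > max(scaled(y_star - 0.01), scaled(y_star + 0.01)))
```

Setting the derivative of the scaling function to zero gives $y^2 + 5y - 2 = 0$, whose positive root is the quoted $y^*$.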

3.5 Global Properties



In addition to elucidating the degree distribution and degree correlations, the rate equations can be 
applied to determine global properties. One useful example is the out-component with respect to a 
given node x - this is the set of nodes that can be reached by following directed links that emanate 
from x (Fig. 7). In the context of the web, this is the set of nodes that are reached by following
hyperlinks that emanate from a fixed node to target nodes, and then iteratively following target
nodes ad infinitum. In a similar vein, one may enumerate all nodes that refer to a fixed node, plus
all nodes that refer to these daughter nodes, etc. This progeny comprises the in-component to node
x - the set from which x can be reached by following a path of directed links. 




Figure 7: In-component and out-components of node x. 



3.5.1 The In-Component 

For simplicity, we study the in-component size distribution for the GN with a constant attachment
kernel, $A_k = 1$. We consider this kernel because many results about network components are
independent of the form of the kernel and thus it suffices to consider the simplest situation; the
extension to more general attachment kernels is discussed in [24].

For the constant attachment kernel, the number $I_s(t)$ of in-components with s nodes satisfies
the rate equation

$$\frac{dI_s}{dt} = \frac{(s-1)I_{s-1} - sI_s}{A} + \delta_{s1}. \tag{3.22}$$

The loss term accounts for processes in which the attachment of a new node to an in-component
of size s increases its size by one. This gives a loss rate that is proportional to s. If there is
more than one in-component of size s they must be disjoint, so that the total loss rate for $I_s(t)$ is
simply $sI_s(t)$. A similar argument applies for the gain term. Finally, dividing by $A(t) = \sum_j A_j N_j(t)$
converts these rates to normalized probabilities. For the constant attachment kernel, $A(t) = M_0(t)$,
so asymptotically $A = t$. Interestingly, Eq. (3.22) is almost identical to the rate equations for the
degree distribution for the GN with linear attachment kernel, except that the prefactor equals $t^{-1}$
rather than $(2t)^{-1}$. This change in the normalization factor is responsible for shifting the exponent
of the resulting distribution from $-3$ to $-2$.

To determine $I_s(t)$, we again note, by explicitly solving the first few of the rate equations, that
each $I_s$ grows linearly in time. Thus we substitute $I_s(t) = t\,i_s$ into Eqs. (3.22) to obtain $i_1 = 1/2$
and $i_s = i_{s-1}(s-1)/(s+1)$. This immediately gives

$$i_s = \frac{1}{s(s+1)}. \tag{3.23}$$

This $s^{-2}$ tail for the in-component distribution is a robust feature, independent of the form of the
attachment kernel [24]. This $s^{-2}$ tail also agrees with recent measurements of the web [10].
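
The in-component distribution for the constant kernel is easy to probe by simulation, because with uniform attachment the in-component of a node is simply the node together with all of its descendants. The sketch below is illustrative only: the name `in_component_sizes`, the single-node initial condition, and the network size are assumptions.

```python
import random
from collections import Counter

def in_component_sizes(num_nodes, seed=0):
    """Grow a GN with constant kernel (uniform attachment) and return, for
    each node, the size of its in-component (the node plus its descendants)."""
    rng = random.Random(seed)
    parent = [None]                          # node 0 is the initial node
    for new in range(1, num_nodes):
        parent.append(rng.randrange(new))    # uniform choice among earlier nodes
    size = [1] * num_nodes                   # every node counts itself
    # Children always have larger indices than their ancestors, so a single
    # backward sweep accumulates complete in-component sizes.
    for node in range(num_nodes - 1, 0, -1):
        size[parent[node]] += size[node]
    return size

if __name__ == "__main__":
    sizes = in_component_sizes(20000, seed=1)
    counts = Counter(sizes)
    for s in (1, 2, 3, 4):
        print(s, round(counts[s] / len(sizes), 4), round(1 / (s * (s + 1)), 4))
```

The measured fractions should approach $1/[s(s+1)]$ for small s, with statistical scatter that shrinks as the network grows.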

3.5.2 The Out-Component 

The complementary out-component from each node can be determined by constructing a mapping 
between the out-component and an underlying network "genealogy" . We build a genealogical tree 
for the GN by taking generation g = 0 to be the initial node. Nodes that attach to those in
generation g form generation g + 1; the node index does not matter in this characterization. For
example, in the network of Fig. 1, node 1 is the "ancestor" of 6, while 10 is the "descendant" of 6
and there are 5 nodes in generation g = 1 and 4 in g = 2. This leads to the genealogical tree of
Fig. 8.




Figure 8: Genealogy of the network in Fig. 1. The node indices indicate when each is introduced.
The nodes are also arranged according to generation number. 



The genealogical tree provides a convenient way to characterize the out-component distribution.
As one can directly verify from Fig. 8, the number $O_s$ of out-components with s nodes equals $L_{s-1}$,
the number of nodes in generation $s-1$ in the genealogical tree. We therefore compute $L_g(t)$, the
size of generation g at time t. For this discussion, we again treat only the constant attachment
kernel and refer the reader to Ref. [24] for more general attachment kernels. We determine $L_g(t)$
by noting that $L_g(t)$ increases when a new node attaches to a node in generation $g-1$. This occurs
with rate $L_{g-1}/M_0$, where $M_0(t) = 1 + t$ is the number of nodes. This gives the differential equation
$\dot L_g(t) = L_{g-1}/(1+t)$ with solution $L_g(\tau) = \tau^g/g!$, where $\tau = \ln(1+t)$. Thus the number $O_s$ of
out-components with s nodes equals

$$O_s(\tau) = \frac{\tau^{s-1}}{(s-1)!}. \tag{3.24}$$

Note that the generation size $L_g(\tau)$ grows with g when $g < \tau$, and then decreases and becomes
of order 1 when $g = e\tau$. The genealogical tree therefore contains approximately $e\tau$ generations at
time t. This result allows us to determine the diameter of the network, since the maximum distance
between any pair of nodes is twice the distance from the root to the last generation. Therefore
the diameter of the network scales as $2e\tau \approx 2e\ln N$; this is the same dependence on N as in the
random graph [2]. More importantly, this result shows that the diameter of the GN is always
small, ranging from the order of $\ln N$ for a constant attachment kernel to the order of one for
super-linear attachment kernels.
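
The generation sizes $L_g = \tau^g/g!$ with $\tau = \ln(1+t)$ can be tabulated directly to see where the profile peaks and where it drops to order one. The sketch below is illustrative; the function name `generation_sizes` and the choice $t = 10^6$ are assumptions.

```python
import math

def generation_sizes(t, g_max):
    """Generation sizes L_g = tau**g / g! for g = 0..g_max, tau = ln(1 + t)."""
    tau = math.log(1.0 + t)
    return [tau ** g / math.factorial(g) for g in range(g_max + 1)]

if __name__ == "__main__":
    t = 1.0e6
    tau = math.log(1.0 + t)
    sizes = generation_sizes(t, 100)
    print("tau =", round(tau, 3), " e*tau =", round(math.e * tau, 3))
    print("largest generation: g =", sizes.index(max(sizes)))
    print("total nodes from sum:", round(sum(sizes), 1), " expected:", 1.0 + t)
```

Summing the generation sizes over all g reproduces the exponential series for $e^\tau = 1 + t$, the total number of nodes, which is a useful consistency check on the solution.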

4 The Web Graph 

In the world-wide web, link directionality is clearly relevant, as hyperlinks go from an issuing 
website to a target website but not vice versa. Thus to characterize the local graph structure more 
fully, the node degree should be resolved into the in-degree - the number of incoming links to a
node, and the complementary out-degree (Fig. 3). Measurements on the web indicate that these
distributions are power laws with different exponents [11]. These properties can be accounted for
by the web graph (WG) model (Fig. 2) and the rate equations provide an extremely convenient
analysis tool. 



4.1 Average Degrees 

Let us first determine the average node degrees (in-degree, out-degree, and total degree) of the 
WG. Let N(t) be the total number of nodes, and I(t) and J(t) the in-degree and out-degree of the 
entire network, respectively. According to the elemental growth steps of the model, these degrees 
evolve by one of the following two possibilities: 

$$(N, I, J) \to \begin{cases}(N+1,\ I+1,\ J+1) & \text{with probability } p,\\ (N,\ I+1,\ J+1) & \text{with probability } q.\end{cases}$$

That is, with probability p a new node and new directed link are created (Fig. 2) so that the number
of nodes and both the total in- and out-degrees increase by one. Conversely, with probability q
a new directed link is created and the in- and out-degrees each increase by one, while the total
number of nodes is unchanged. As a result, $N(t) = pt$, and $I(t) = J(t) = t$. Thus the average in-
and out-degrees, $\mathcal{D}_{\rm in} = I(t)/N(t)$ and $\mathcal{D}_{\rm out} = J(t)/N(t)$, are both equal to $1/p$.



4.2 Degree Distributions 

To determine the degree distributions, we need to specify: (i) the attachment rate $A(i,j)$, defined
as the probability that a newly-introduced node links to an existing node with i incoming and j
outgoing links, and (ii) the creation rate $C(i_1,j_1|i_2,j_2)$, defined as the probability of adding a new
link from an $(i_1,j_1)$ node to an $(i_2,j_2)$ node. We will use rates that are expected to occur in the
web. Clearly, the attachment and creation rates should be non-decreasing in i and j. Moreover,
it seems intuitively plausible that the attachment rate depends only on the in-degree of the target
node, $A(i,j) = A_i$, i.e., a website designer decides to create a link to a target based only on the
popularity of the latter. In the same spirit, we take the link creation rate to depend only on the
out-degree of the issuing node and the in-degree of the target node, $C(i_1,j_1|i_2,j_2) = C(j_1,i_2)$. The
former property reflects the fact that the development rate of a site depends only on the number
of outgoing links.

The interesting situation of power-law degree distributions arises for asymptotically linear rates, 
and we therefore consider 



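$$A_i = i + \lambda_{\rm in} \qquad\text{and}\qquad C(j,i) = (i+\lambda_{\rm in})(j+\lambda_{\rm out}). \tag{4.1}$$

The parameters $\lambda_{\rm in}$ and $\lambda_{\rm out}$ must satisfy the constraints $\lambda_{\rm in} > 0$ and $\lambda_{\rm out} > -1$ to ensure that the
rates are positive for all attainable in- and out-degree values, $i \ge 0$ and $j \ge 1$.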

With these rates, the joint degree distribution, $N_{ij}(t)$, defined as the average number of nodes
with $i$ incoming and $j$ outgoing links, evolves according to

$$\frac{dN_{ij}}{dt} = (p+q)\,\frac{(i-1+\lambda_{\rm in})N_{i-1,j} - (i+\lambda_{\rm in})N_{ij}}{I+\lambda_{\rm in}N} + q\,\frac{(j-1+\lambda_{\rm out})N_{i,j-1} - (j+\lambda_{\rm out})N_{ij}}{J+\lambda_{\rm out}N} + p\,\delta_{i0}\delta_{j1}. \tag{4.2}$$

The first group of terms on the right accounts for the changes in the in-degree of target nodes by
simultaneous creation of a new node and link (probability p) or by creation of a new link only
(probability q). For example, the creation of a link to a node with in-degree $i$ leads to a loss in
the number of such nodes. This occurs with rate $(p+q)(i+\lambda_{\rm in})N_{ij}$, divided by the appropriate
normalization factor $\sum_{i,j}(i+\lambda_{\rm in})N_{ij} = I + \lambda_{\rm in}N$. The factor p + q = 1 in Eq. (4.2) is explicitly
written to make clear these two types of processes. Similarly, the second group of terms accounts for
out-degree changes. These occur due to the creation of new links between already existing nodes -
hence the prefactor q. The last term accounts for the introduction of new nodes with no incoming
links and one outgoing link. As a useful consistency check, one may verify that the total number of
nodes, $N = \sum_{i,j} N_{ij}$, grows according to $\dot N = p$, while the total in- and out-degrees, $I = \sum_{i,j} i\,N_{ij}$
and $J = \sum_{i,j} j\,N_{ij}$, obey $\dot I = \dot J = 1$.

By solving the first few of Eqs. (4.2), it is again clear that the $N_{ij}$ grow linearly with time.
Accordingly, we substitute $N_{ij}(t) = t\,n_{ij}$, as well as N = pt and I = J = t, into Eqs. (4.2) to yield
a recursion relation for $n_{ij}$. Using the shorthand notations,

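$$a = q\,\frac{1+p\lambda_{\rm in}}{1+p\lambda_{\rm out}} \qquad\text{and}\qquad b = 1 + (1+p)\lambda_{\rm in},$$

the recursion relation for $n_{ij}$ is

$$[\,i + a(j+\lambda_{\rm out}) + b\,]\,n_{ij} = (i-1+\lambda_{\rm in})\,n_{i-1,j} + a(j-1+\lambda_{\rm out})\,n_{i,j-1} + p(1+p\lambda_{\rm in})\,\delta_{i0}\delta_{j1}. \tag{4.3}$$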

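The in-degree and out-degree distributions are straightforwardly expressed through the joint
distribution: $I_i(t) = \sum_j N_{ij}(t)$ and $O_j(t) = \sum_i N_{ij}(t)$. Because of the linear time dependence of the
node degrees, we write $I_i(t) = t\,I_i$ and $O_j(t) = t\,O_j$. The densities $I_i$ and $O_j$ satisfy

$$(i+b)\,I_i = (i-1+\lambda_{\rm in})\,I_{i-1} + p(1+p\lambda_{\rm in})\,\delta_{i0}, \tag{4.4a}$$

$$\left(j + \frac{1}{q} + \frac{\lambda_{\rm out}}{q}\right) O_j = (j-1+\lambda_{\rm out})\,O_{j-1} + p\,\frac{1+p\lambda_{\rm out}}{q}\,\delta_{j1}, \tag{4.4b}$$

respectively. The solution to these recursion formulae may be expressed in terms of the following
ratios of gamma functions

$$I_i = I_0\,\frac{\Gamma(i+\lambda_{\rm in})\,\Gamma(b+1)}{\Gamma(i+b+1)\,\Gamma(\lambda_{\rm in})}, \tag{4.5a}$$

$$O_j = O_1\,\frac{\Gamma(j+\lambda_{\rm out})\,\Gamma(2+q^{-1}+\lambda_{\rm out}q^{-1})}{\Gamma(j+1+q^{-1}+\lambda_{\rm out}q^{-1})\,\Gamma(1+\lambda_{\rm out})}, \tag{4.5b}$$

with $I_0 = p(1+p\lambda_{\rm in})/b$ and $O_1 = p(1+p\lambda_{\rm out})/(1+q+\lambda_{\rm out})$.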
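As a numerical cross-check, the following minimal sketch iterates the recursions (4.4a) and (4.4b) and compares the result with the gamma-function forms (4.5a) and (4.5b). The function name `web_graph_densities`, the cutoff `kmax`, and the shorthand `s` for $q^{-1} + \lambda_{\rm out}q^{-1}$ are illustrative choices rather than notation used elsewhere in this paper; the sketch assumes $0 < p < 1$, $\lambda_{\rm in} > 0$ and $\lambda_{\rm out} > -1$.

```python
from math import exp, lgamma


def web_graph_densities(p, lam_in, lam_out, kmax=50):
    """Return [(I_rec, I_closed)] for i = 0..kmax and [(O_rec, O_closed)] for j = 1..kmax."""
    q = 1.0 - p
    b = 1.0 + (1.0 + p) * lam_in
    s = (1.0 + lam_out) / q  # shorthand for q^{-1} + lam_out * q^{-1}
    i0 = p * (1.0 + p * lam_in) / b
    o1 = p * (1.0 + p * lam_out) / (1.0 + q + lam_out)

    in_pairs, value = [], i0
    for i in range(kmax + 1):
        if i > 0:
            value *= (i - 1 + lam_in) / (i + b)  # Eq. (4.4a) for i >= 1
        closed = i0 * exp(lgamma(i + lam_in) + lgamma(b + 1)
                          - lgamma(i + b + 1) - lgamma(lam_in))
        in_pairs.append((value, closed))

    out_pairs, value = [], o1
    for j in range(1, kmax + 1):
        if j > 1:
            value *= (j - 1 + lam_out) / (j + s)  # Eq. (4.4b) for j >= 2
        closed = o1 * exp(lgamma(j + lam_out) + lgamma(2 + s)
                          - lgamma(j + 1 + s) - lgamma(1 + lam_out))
        out_pairs.append((value, closed))
    return in_pairs, out_pairs


if __name__ == "__main__":
    ins, outs = web_graph_densities(1 / 7.5, 0.75, 3.55, kmax=5)
    for i, (rec, closed) in enumerate(ins):
        print(i, rec, closed)
```

Working with `lgamma` rather than the gamma function itself keeps the comparison stable at large degrees, where the individual gamma functions overflow.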

From the asymptotics of the gamma function, the in- and out-degree distributions have the
distinct power-law forms [19],

$$I_i \sim i^{-\nu_{\rm in}}, \qquad \nu_{\rm in} = 2 + p\lambda_{\rm in}, \tag{4.6a}$$

$$O_j \sim j^{-\nu_{\rm out}}, \qquad \nu_{\rm out} = 1 + q^{-1} + \lambda_{\rm out}\,p\,q^{-1}, \tag{4.6b}$$

with $\nu_{\rm in}$ and $\nu_{\rm out}$ both necessarily greater than 2. Let us now compare these predictions with
current data for the web [11]. First, the value of $p$ is fixed by noting that $p^{-1}$ equals the average
in-degree (equivalently, out-degree) of the network. Current data for the web gives $\mathcal{D}_{\rm in} = \mathcal{D}_{\rm out} \approx 7.5$,
and thus we set $p^{-1} = 7.5$. Now Eqs. (4.6) contain two free parameters and by choosing them to be
$\lambda_{\rm in} = 0.75$ and $\lambda_{\rm out} = 3.55$ we reproduce the observed exponents for the degree distributions of the
web, $\nu_{\rm in} \approx 2.1$ and $\nu_{\rm out} \approx 2.7$, respectively. The fact that the parameters $\lambda_{\rm in}$ and $\lambda_{\rm out}$ are of the
order of one indicates that the model with linear rates of node attachment and bilinear rates of
link creation is a viable description of the web.
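The arithmetic behind this comparison is short enough to spell out. The sketch below simply evaluates Eqs. (4.6a) and (4.6b); the function name `web_graph_exponents` is an illustrative choice.

```python
def web_graph_exponents(p, lam_in, lam_out):
    """Evaluate the WG exponents of Eqs. (4.6a) and (4.6b) for 0 < p < 1."""
    q = 1.0 - p
    nu_in = 2.0 + p * lam_in
    nu_out = 1.0 + 1.0 / q + lam_out * p / q
    return nu_in, nu_out


if __name__ == "__main__":
    # parameter values quoted in the text: 1/p = 7.5, lam_in = 0.75, lam_out = 3.55
    print(web_graph_exponents(1 / 7.5, 0.75, 3.55))
```

With $p = 2/15$ and $q = 13/15$ one has $p/q = 2/13$, so that $\nu_{\rm in} = 2 + 0.1$ and $\nu_{\rm out} = 1 + (15 + 7.1)/13 = 2.7$.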

5 Multicomponent Graph



In addition to the degree distributions, current measurements indicate that the web consists of
a "giant" component that contains approximately 91% of all nodes, and a large number of finite
components [11]. The models discussed thus far are unsuited to describe the number and size
distribution of these components, since the growth rules necessarily produce only a single connected
component. In this section, we outline a simple modification of the WG, the multicomponent graph
(MG), that naturally produces many components. In this example, the rate equations now provide
a comprehensive characterization for the size distribution of the components.

In the MG model, we simply separate node and link creation steps. Namely, when a node 
is introduced it does not immediately attach to an earlier node, but rather, a new node begins 
its existence as isolated and joins the network only when a link creation event reaches the new 
node. For the average network degrees, this small modification already has a significant effect. The 
number of nodes and the total in- and out-degrees of the network, N, I, J now increase with time 
as N = pt and I = J = qt. Thus the average in- and out-degrees are time independent and
equal to $q/p$, while the average total degree is $\mathcal{D} = 2q/p$.

As in the case of the WG model, we study the case of a bilinear link creation rate given in
Eq. (4.1), with now $\lambda_{\rm in}, \lambda_{\rm out} > 0$ to ensure that $C(j,i) > 0$ for all permissible in- and out-degrees,
$i \ge 0$ and $j \ge 0$.

5.1 Local Properties 

We study local characteristics by employing the same approach as in the WG model. We find that
results differ only in minute details, e.g., the in- and out-degree densities $I_i$ and $O_j$ are again
ratios of gamma functions, and the respective exponents are

$$\nu_{\rm in} = 2\left(1 + \frac{\lambda_{\rm in}}{\mathcal{D}}\right), \qquad \nu_{\rm out} = 2\left(1 + \frac{\lambda_{\rm out}}{\mathcal{D}}\right). \tag{5.1}$$

Notice the decoupling - the in-degree exponent is independent of $\lambda_{\rm out}$, while $\nu_{\rm out}$ is independent
of $\lambda_{\rm in}$. The expressions (5.1) are neater than their WG counterparts, reflecting the fact that the
governing rules of the MG model are more symmetric.

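To complement our discussion, we now outline the asymptotic behavior of the joint in- and
out-degree distribution. Although this distribution defies general analysis, we can obtain partial
and useful information by fixing one index and letting the other index vary. An elementary but
cumbersome analysis yields the following limiting behaviors

$$n_{ij} \sim i^{-\xi_{\rm in}} \quad (i \to \infty,\ j \text{ fixed}), \qquad n_{ij} \sim j^{-\xi_{\rm out}} \quad (j \to \infty,\ i \text{ fixed}),$$

with

$$\xi_{\rm in} = \nu_{\rm in} + \frac{\mathcal{D}}{2}\,\frac{(\nu_{\rm in}-1)(\nu_{\rm out}-2)}{\nu_{\rm out}-1}, \qquad \xi_{\rm out} = \nu_{\rm out} + \frac{\mathcal{D}}{2}\,\frac{(\nu_{\rm out}-1)(\nu_{\rm in}-2)}{\nu_{\rm in}-1}. \tag{5.2}$$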

We also can determine the joint degree distribution analytically in the subset of the parameter
space where $\nu_{\rm in} = \nu_{\rm out}$, i.e., $\lambda_{\rm in} = \lambda_{\rm out}$. In what follows, we therefore denote $\lambda_{\rm in} = \lambda_{\rm out} = \lambda$. The
resulting recursion equation for the joint degree distribution is

$$(i + j + 1 + \lambda + \lambda q^{-1})\,n_{ij} = (i-1+\lambda)\,n_{i-1,j} + (j-1+\lambda)\,n_{i,j-1} + c\,\delta_{i,0}\delta_{j,0}, \tag{5.3}$$

with $c = p(1 + 2\lambda/\mathcal{D})$. Because the degrees $i$ and $j$ appear in Eq. (5.3) with equal prefactors, the
substitution

$$n_{ij} = \frac{\Gamma(i+\lambda)\,\Gamma(j+\lambda)}{\Gamma(i+j+2+\lambda+\lambda q^{-1})}\,m_{ij}$$

reduces Eqs. (5.3) into the constant-coefficient recursion relation

$$m_{ij} = m_{i-1,j} + m_{i,j-1} + \mu\,\delta_{i,0}\delta_{j,0}, \qquad\text{with}\quad \mu = c\,\frac{\Gamma(1+\lambda+\lambda q^{-1})}{\Gamma^2(\lambda)}. \tag{5.4}$$



We solve Eq. (5.4) by employing the generating function technique. Multiplying Eq. (5.4) by
$x^i y^j$ and summing over all $i,j \ge 0$, we find that the generating function $\mathcal{M}(x,y) = \sum_{i,j\ge 0} m_{ij}\,x^i y^j$
equals $\mu/(1 - x - y)$. Expanding $\mathcal{M}(x,y)$ in $x$ yields $\mu\sum_{i\ge 0} x^i/(1-y)^{i+1}$, which we then expand in $y$
by employing the identity $(1-y)^{-i-1} = \sum_{j\ge 0}\binom{i+j}{i}\,y^j$. Finally, we arrive at

$$m_{ij} = \mu\,\frac{\Gamma(i+j+1)}{\Gamma(i+1)\,\Gamma(j+1)}, \tag{5.5}$$

from which the joint degree distribution is

$$n_{ij} = \mu\,\frac{\Gamma(i+\lambda)\,\Gamma(j+\lambda)\,\Gamma(i+j+1)}{\Gamma(i+1)\,\Gamma(j+1)\,\Gamma(i+j+2+\lambda+\lambda q^{-1})} \;\to\; \mu\,\frac{(ij)^{\lambda-1}}{(i+j)^{1+\lambda+\lambda/q}} \quad\text{as } i,j \to \infty. \tag{5.6}$$

Thus again, the in- and out-degrees of a node are correlated: $n_{ij} \ne I_i O_j \sim i^{-\nu_{\rm in}} j^{-\nu_{\rm out}}$.
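The closed form (5.6) can be checked directly against the recursion (5.3). The sketch below is a minimal illustration under the assumptions $0 < p < 1$ and $\lambda > 0$; the function names `mg_joint_density` and `recursion_residual` are illustrative, and the source term is written as $c = p(1 + p\lambda/q)$, which equals $p(1 + 2\lambda/\mathcal{D})$ for $\mathcal{D} = 2q/p$.

```python
from math import exp, lgamma, log


def mg_joint_density(i, j, p, lam):
    """n_ij of Eq. (5.6) for lam_in = lam_out = lam (requires 0 < p < 1 and lam > 0)."""
    q = 1.0 - p
    s = lam + lam / q
    c = p * (1.0 + p * lam / q)  # c = p (1 + 2 lam / D) with D = 2q/p
    log_mu = log(c) + lgamma(1 + s) - 2 * lgamma(lam)
    return exp(log_mu + lgamma(i + lam) + lgamma(j + lam) + lgamma(i + j + 1)
               - lgamma(i + 1) - lgamma(j + 1) - lgamma(i + j + 2 + s))


def recursion_residual(i, j, p, lam):
    """Left side minus right side of Eq. (5.3); should vanish up to rounding."""
    q = 1.0 - p
    lhs = (i + j + 1 + lam + lam / q) * mg_joint_density(i, j, p, lam)
    rhs = 0.0
    if i > 0:
        rhs += (i - 1 + lam) * mg_joint_density(i - 1, j, p, lam)
    if j > 0:
        rhs += (j - 1 + lam) * mg_joint_density(i, j - 1, p, lam)
    if i == 0 and j == 0:
        rhs += p * (1.0 + p * lam / q)  # source term c for isolated new nodes
    return lhs - rhs


if __name__ == "__main__":
    worst = max(abs(recursion_residual(i, j, 0.5, 1.5))
                for i in range(10) for j in range(10))
    print("largest residual:", worst)
```

The residual should be at the level of floating-point rounding for every pair $(i,j)$, including the boundary rows $i = 0$ and $j = 0$ where one of the gain terms is absent.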



5.2 Global Properties 

Let us now turn to the distribution of connected components (clusters, for brevity). For
simplicity, we consider models with undirected links. Let us first estimate the total number of
clusters $\mathcal{N}$. At each time step, $\mathcal{N} \to \mathcal{N} + 1$ with probability $p$, or $\mathcal{N} \to \mathcal{N} - 1$ with probability $q$.
This implies

$$\mathcal{N} = (p - q)\,t. \tag{5.7}$$

The gain rate of $\mathcal{N}$ is exactly equal to $p$, while in the loss term we ignore self-connections and
tacitly assume that links are always created between different clusters. In the long-time limit,
self-connections should be asymptotically negligible when the total number of clusters grows with
time and no macroscopic clusters (i.e., components that contain a finite fraction of all nodes) arise.

This assumption of no self-connections greatly simplifies the description of the cluster merging 
process. Consider two clusters (labeled by $\alpha = 1,2$) with total in-degrees $i_\alpha$, out-degrees $j_\alpha$, and
number of nodes $k_\alpha$. When these clusters merge, the combined cluster is characterized by

$$i = i_1 + i_2 + 1, \qquad j = j_1 + j_2 + 1, \qquad k = k_1 + k_2.$$

Thus starting with single-node clusters with $(i,j,k) = (0,0,1)$, the above merging rule leads to
clusters that always satisfy the constraint $i = j = k - 1$. Thus the size $k$ characterizes both the
in-degree and out-degree of clusters.

To simplify formulae without sacrificing generality, we consider the link creation rate of Eq. (4.1)
with $\lambda_{\rm in} = \lambda_{\rm out} = 1$. Then the merging rate $W(k_1,k_2)$ of the two clusters is proportional to
$(i_1 + k_1)(j_2 + k_2) + (i_2 + k_2)(j_1 + k_1)$, or

$$W(k_1,k_2) = (2k_1 - 1)(2k_2 - 1).$$
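The merging rule and the resulting rate can be illustrated with a small sketch. Clusters are represented here as tuples (i, j, k), an illustrative choice, and the helper names `merge` and `merging_rate` are not taken from the text; the rate is the unnormalized bilinear expression quoted above for $\lambda_{\rm in} = \lambda_{\rm out} = 1$.

```python
def merge(c1, c2):
    """Combine two clusters (i, j, k) joined by one new directed link."""
    (i1, j1, k1), (i2, j2, k2) = c1, c2
    return (i1 + i2 + 1, j1 + j2 + 1, k1 + k2)


def merging_rate(c1, c2):
    """Unnormalized rate that a new link joins c1 and c2 (either direction)."""
    (i1, j1, k1), (i2, j2, k2) = c1, c2
    return (i1 + k1) * (j2 + k2) + (i2 + k2) * (j1 + k1)


if __name__ == "__main__":
    single = (0, 0, 1)
    pair = merge(single, single)    # a two-node cluster
    triple = merge(pair, single)    # a three-node cluster
    for a, b in [(single, single), (pair, triple)]:
        print(a, b, merging_rate(a, b), 2 * (2 * a[2] - 1) * (2 * b[2] - 1))
```

For clusters built from single nodes the constraint $i = j = k - 1$ holds at every stage, so the bilinear expression equals $2(2k_1-1)(2k_2-1)$, which is proportional to $W(k_1,k_2)$.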

Let $C(k,t)$ denote the number of clusters of mass $k$. This distribution evolves according to

$$\frac{dC(k,t)}{dt} = \frac{q}{t^2}\sum_{k_1+k_2=k}(2k_1-1)(2k_2-1)\,C(k_1,t)\,C(k_2,t) - \frac{2q}{t}\,(2k-1)\,C(k,t) + p\,\delta_{k,1}. \tag{5.8}$$

The first set of terms accounts for the gain in $C(k,t)$ due to the coalescence of clusters of size $k_1$
and $k_2$, with $k_1 + k_2 = k$. Similarly, the second set of terms accounts for the loss in $C(k,t)$ due to
the coalescence of a cluster of size $k$ with any other cluster. The last term accounts for the input of
unit-size clusters. These rate equations are similar to those of irreversible aggregation with product
kernel [15]. The primary difference is that we explicitly treat the number of clusters as finite.



One can verify that the total number of nodes $N(t) = \sum_k k\,C(k,t)$ grows with rate $p$ and that
the total number of clusters $\mathcal{N}(t) = \sum_k C(k,t)$ grows with rate $p - q$, in agreement with Eq. (5.7).
Solving the first few Eqs. (5.8) shows again that the $C(k,t)$ grow linearly with time. Accordingly, we
substitute $C(k,t) = t\,c_k$ into Eqs. (5.8) to yield the time-independent recursion relation

$$c_k = q\sum_{k_1+k_2=k}(2k_1-1)(2k_2-1)\,c_{k_1}c_{k_2} - 2q(2k-1)\,c_k + p\,\delta_{k,1}. \tag{5.9}$$
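Because the sum in Eq. (5.9) involves only sizes smaller than $k$, the densities can be generated one at a time after moving the loss term to the left, $[1 + 2q(2k-1)]\,c_k = q\sum_{k_1+k_2=k}(2k_1-1)(2k_2-1)\,c_{k_1}c_{k_2} + p\,\delta_{k,1}$. The following minimal sketch does this; the function name `cluster_densities` and the cutoff `kmax` are illustrative choices, and the sum rules are expected to hold only in the regime without a giant component.

```python
def cluster_densities(p, kmax=200):
    """Solve Eq. (5.9) recursively; returns a list c with c[k] for k = 1..kmax (c[0] unused)."""
    q = 1.0 - p
    c = [0.0] * (kmax + 1)
    for k in range(1, kmax + 1):
        # gain from coalescence of smaller clusters, ordered pairs (k1, k - k1)
        gain = sum((2 * k1 - 1) * (2 * (k - k1) - 1) * c[k1] * c[k - k1]
                   for k1 in range(1, k))
        source = p if k == 1 else 0.0  # input of single-node clusters
        c[k] = (q * gain + source) / (1.0 + 2.0 * q * (2 * k - 1))
    return c


if __name__ == "__main__":
    p = 0.97
    c = cluster_densities(p)
    m0 = sum(c[1:])
    m1 = sum(k * ck for k, ck in enumerate(c))
    print("sum of c_k   :", m0, " expected p - q =", 2 * p - 1)
    print("sum of k c_k :", m1, " expected p     =", p)
```

For $p = 0.97$ the partial sums settle quickly because the densities decay rapidly with $k$; closer to the threshold a larger `kmax` would be needed.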

A giant component, i.e., a cluster that contains a finite fraction of all the nodes, emerges when
the link creation rate exceeds a threshold value. To determine this threshold, we study the moments
of the cluster size distribution $M_n = \sum_{k\ge 1} k^n c_k$. We already know that the first two moments
are $M_0 = p - q$ and $M_1 = p$. We can obtain an equation for the second moment by multiplying
Eq. (5.9) by $k^2$ and summing over $k \ge 1$ to give $M_2 = 2q(2M_2 - M_1)^2 + p$. When this equation
has a real solution, $M_2$ is finite. The solution is

$$M_2 = \frac{1 + 8pq - \sqrt{1 - 16pq}}{16q}. \tag{5.10}$$

The condition $1 - 16pq = 0$ gives the threshold value $p_c = (2 + \sqrt{3})/4$. For $1 - 16pq > 0$ ($p > p_c$) all
clusters have finite size and the second moment is finite.
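The second moment and the threshold are easy to evaluate numerically. The sketch below is a minimal illustration; the names `second_moment` and `P_C` are illustrative, and the function assumes $0 < p < 1$ with $16pq \le 1$.

```python
from math import sqrt

# threshold where 1 - 16 p q = 0, taking the root with p > 1/2
P_C = (2.0 + sqrt(3.0)) / 4.0


def second_moment(p):
    """M_2 from Eq. (5.10); only defined in the regime 16 p q <= 1 with p < 1."""
    q = 1.0 - p
    disc = 1.0 - 16.0 * p * q
    if disc < 0.0:
        raise ValueError("no real solution: 16 p q > 1")
    return (1.0 + 8.0 * p * q - sqrt(disc)) / (16.0 * q)


if __name__ == "__main__":
    p = 0.97
    q = 1.0 - p
    m2 = second_moment(p)
    # check that M_2 solves M_2 = 2q (2 M_2 - M_1)^2 + p with M_1 = p
    print(m2, 2.0 * q * (2.0 * m2 - p) ** 2 + p)
    print("p_c =", P_C)
```

The smaller root of the quadratic is the relevant one, since it reduces to $M_2 \to 1$ as $q \to 0$, when almost all clusters are single nodes.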

In this steady-state regime, we can obtain the cluster size distribution by introducing the
generating function $\mathcal{C}(z) = \sum_{k\ge 1} c_k z^k$ to convert Eq. (5.9) into the differential equation

$$2z\,\mathcal{C}'(z) - \mathcal{C}(z) = 1 - \sqrt{1 - [\,pz - \mathcal{C}(z)\,]/q}. \tag{5.11}$$



The asymptotic behavior of the cluster size distribution can now be read off from the behavior of
the generating function in the $z \to 1$ limit. In particular, the power-law behavior

$$c_k \sim \frac{B}{k^{\tau}} \quad\text{as } k \to \infty \tag{5.12}$$

implies that the corresponding generating function has the form

$$\mathcal{C}(z) = M_0 + M_1(z-1) + \frac{M_2 - M_1}{2}\,(z-1)^2 + B\,\Gamma(1-\tau)\,(1-z)^{\tau-1} + \ldots. \tag{5.13}$$

Here the asymptotic behavior is controlled by the dominant singular term $(1-z)^{\tau-1}$. However, there
are also subdominant singular terms and regular terms in the generating function. In Eq. (5.13) we
explicitly included the three regular terms which ensure that the first three moments of the
cluster-size distribution are correctly reproduced, namely, $\mathcal{C}(1) = M_0$, $\mathcal{C}'(1) = M_1$, and $\mathcal{C}''(1) = M_2 - M_1$.
Finally, substituting Eq. (5.13) into Eq. (5.11) we find that the dominant singular terms are of
the order of $(1-z)^{\tau-2}$. Balancing all contributions of this order in the equation determines the
exponent of the cluster size distribution to be

$$\tau = 1 + \frac{2}{1 - \sqrt{1 - 16pq}}. \tag{5.14}$$

This exponent satisfies the bound $\tau > 3$ and thus justifies using the behavior of the second moment
of the size distribution as the criterion to find the threshold value $p_c$.

For $p > p_c$ there is no giant cluster and the cluster size distribution has a power-law tail with
$\tau$ given by Eq. (5.14). Intriguingly, the power-law form holds for any value $p > p_c$. This is in
stark contrast to all other percolation-type phenomena, where away from the threshold, there is an
exponential tail in cluster size distributions [27]. Thus in contrast to ordinary critical phenomena,
the entire range $p > p_c$ is critical.

As a corollary to the power-law tail of the cluster size distribution for $p > p_c$, we can estimate
the size of the largest cluster $k_{\max}$ to see how "finite" it really is. Using the extreme statistics
criterion $\sum_{k\ge k_{\max}} N c_k = 1$ we obtain $k_{\max} \sim N^{1/(\tau-1)}$, or

$$k_{\max} \sim N^{(1 - \sqrt{1-16pq})/2}. \tag{5.15}$$

This is very different from the corresponding behavior on the random graph, where below the
percolation threshold the largest component scales logarithmically with the number of nodes. Thus
for the random graph, the dependence of $k_{\max}(N)$ changes from $\ln N$ just below, to $N$ just above
the percolation threshold; for the MG, the change is much more gentle: from $N^{1/2}$ to $N$.
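To get a feeling for the numbers, the following minimal sketch evaluates the exponent $\tau$ of Eq. (5.14) and the growth exponent $1/(\tau-1)$ of the largest cluster. The function names `tau` and `kmax_exponent` are illustrative, the sketch assumes $p_c \le p < 1$, and a tiny negative discriminant from rounding at the threshold is clipped to zero.

```python
from math import sqrt


def tau(p):
    """Cluster-size exponent of Eq. (5.14), valid for p_c <= p < 1."""
    q = 1.0 - p
    disc = 1.0 - 16.0 * p * q
    if disc < -1e-12:
        raise ValueError("p is below the threshold p_c")
    return 1.0 + 2.0 / (1.0 - sqrt(max(disc, 0.0)))


def kmax_exponent(p):
    """Exponent x in k_max ~ N^x, i.e. x = 1/(tau - 1), cf. Eq. (5.15)."""
    return 1.0 / (tau(p) - 1.0)


if __name__ == "__main__":
    p_c = (2.0 + sqrt(3.0)) / 4.0
    for p in (p_c, 0.95, 0.97, 0.99):
        print(f"p = {p:.4f}  tau = {tau(p):.3f}  k_max ~ N^{kmax_exponent(p):.3f}")
```

The growth exponent equals 1/2 at the threshold and decreases toward zero as $p \to 1$, so the largest cluster becomes progressively smaller compared with $N^{1/2}$ as link creation becomes rarer.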

These considerations suggest that the phase transition in the MG is dramatically different from
the percolation transition. Very recently, simplified versions of the MG were studied [21, 28, 29



|3~i~|| . Numerical [21] and analytical |2^, |3(], 31] evidence suggest that the size of the giant component 
G(p) near the threshold scales as 



$$G(p) \propto \exp\left(-\frac{\text{const.}}{\sqrt{p_c - p}}\right). \tag{5.16}$$



Therefore, the phase transition of this dynamically grown network is of infinite order since all
derivatives of $G(p)$ vanish as $p \to p_c$. In contrast, static random graphs with any desired degree
distribution 



exhibit a standard percolation transition [21, w 



6 Summary 

In this paper, we have presented a statistical physics viewpoint on growing network problems. 
This perspective is strongly influenced by the phenomenon of aggregation kinetics, where the rate 
equation approach has proved extremely useful. From the wide range of results that we were 
able to obtain for evolving networks, we hope that the reader appreciates both the simplicity and 
the power of the rate equation method for characterizing evolving networks. We quantified the 
degree distribution of the growing network model and found a diverse range of phenomenology that 
depends on the form of the attachment kernel. At the qualitative level, a stretched exponential 
form for the degree distribution should be regarded as "generic" , since it occurs for an attachment 
kernel that is sub-linear in node degree (e.g., $A_k \sim k^\gamma$ with $\gamma < 1$). On the other hand, a power-law
degree distribution arises only for linear attachment kernels, $A_k \sim k$. However, this result is
"non-generic" as the degree distribution exponent now depends on the detailed form of the attachment
kernel.

We investigated extensions of the basic growing network to incorporate processes that naturally
occur in the development of the web. In particular, by allowing for link directionality, the full degree
distribution naturally resolves into independent in-degree and out-degree distributions. When the
rates at which links are created are linear functions of the in- and out-degrees of the terminal
nodes of the link, the in- and out-degree distributions are power laws with different exponents,
$\nu_{\rm in}$ and $\nu_{\rm out}$, that match with current measurements on the web with reasonable values for the
model parameters. We also considered a model with independent node and link creation rates. 
This leads to a network with many independent components and now the size distribution of these 
components is an important characteristic. We have characterized basic aspects of this process by 
the rate equation approach and showed that the network is in a critical state even away from the 
percolation threshold. The rate equation approach also provides evidence of an unusual, infinite- 
order percolation transition. 

While statistical physics tools have fueled much progress in elucidating the structure of growing 
networks, there are still many open questions. One set is associated with understanding dynamical 
processes in such networks. For example, what is the nature of information transmission? What 
governs the formation of traffic jams on the web? Another set is concerned with growth mechanisms. 
While we can make much progress in characterizing networks with idealized growth rules, it is 
important to understand the actual rules that govern the growth of the Internet. These issues 
appear to be fruitful challenges for future research. 